A data-based large-scale model for primary visual cortex enables brain-like robust and versatile visual processing

We analyze visual processing capabilities of a large-scale model for area V1 that arguably provides the most comprehensive accumulation of anatomical and neurophysiological data to date. We find that this brain-like neural network model can reproduce a number of characteristic visual processing capabilities of the brain, in particular the capability to solve diverse visual processing tasks, also on temporally dispersed visual information, with remarkable robustness to noise. This V1 model, whose architecture and neurons markedly differ from those of deep neural networks used in current artificial intelligence (AI), such as convolutional neural networks (CNNs), also reproduces a number of characteristic neural coding properties of the brain, which provides explanations for its superior noise robustness. Because visual processing is substantially more energy efficient in the brain compared with CNNs in AI, such brain-like neural networks are likely to have an impact on future technology: as blueprints for visual processing in more energy-efficient neuromorphic hardware.

[...] Performance on the 5 tasks is not very sensitive to the amplitude of internal noise in the network. The noise level used in this study is q = 2, s = 2. All tasks were evaluated on the test dataset.

[Figure residue, panel labels: "Same images were presented twice with different background noise"; "SVD"; "Fine orientation discrimination"; "LGN"]

Figure S2: Schematic diagram of cvPCA (cross-validated PCA) to extract signal and noise according to (11). (A) Correlations (r) of neural responses for two presentations of the same natural image, using our data-driven noise model with s = q = 2 and projected onto selected principal components (PC [...]

Figure S4: Adding voltage reset does not significantly affect the spike response of the GLIF3 model. (A) A step-function stimulus (top) was injected into the GLIF3 model with voltage reset used in this study (middle) and into the GLIF3 model without voltage reset from (35). The after-spike currents in the GLIF3 model cause spike-frequency adaptation, i.e., they reduce the firing rate during the later part of the input current injection. (B-C) Same as in (A) but for two different intensities of the step stimulus. The difference between the two models is negligible. For convenience, we henceforth also refer to our modified GLIF3 model with voltage reset as the GLIF3 model.

[...] ii) data-driven noise model with s = q = 2, iii) no injected noise (No noise). The test accuracy is not sensitive to the choice among these noise scenarios. We used the data-driven noise model with s = q = 2 during testing.
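The cvPCA idea named in the Figure S2 caption can be illustrated with a minimal sketch. It assumes two response matrices of shape (stimuli, neurons) for two presentations of the same images; the function name `cvpca` and the simplification of using all stimuli both for fitting the principal axes and for evaluation are assumptions made here for illustration, and the exact procedure in (11) may differ.

```python
import numpy as np

def cvpca(resp_a, resp_b):
    """Cross-validated variance per principal component (illustrative sketch).

    resp_a, resp_b: arrays of shape (n_stimuli, n_neurons) holding responses
    to two presentations of the same stimuli with independent noise.
    """
    # Center each repeat across stimuli.
    a = resp_a - resp_a.mean(axis=0)
    b = resp_b - resp_b.mean(axis=0)
    # Principal axes are estimated from the first repeat only.
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    # Project both repeats onto the same axes.
    proj_a = a @ vt.T
    proj_b = b @ vt.T
    # Covariance of the two projections across stimuli: noise that is
    # independent between repeats averages out, stimulus-driven signal remains.
    return (proj_a * proj_b).sum(axis=0) / a.shape[0]
```

Because the noise in the two repeats is independent, the cross-validated spectrum estimates stimulus-related variance only, whereas an ordinary PCA spectrum of a single repeat would also include the noise variance.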
(B) Decay of accuracy of the V1 model trained with three different noise scenarios for increasing amplitudes of internal noise; see Fig. 3E for a preceding analysis of the V1 model. These noise scenarios do not dramatically affect the noise robustness of the V1 model.

[...] Red and blue dots represent the spikes of excitatory and inhibitory neurons, respectively. Note that the spikes and membrane potentials of the model were reset to 0 after each classification (separated by the thick black line). (C) Spike raster of readout neurons. 10% of neurons are sampled in every readout population. Color codes of panels are the same as in Fig. 2A. From top to bottom: readout populations for the fine orientation discrimination, the image classification, the visual change detection of natural images and gratings, and the evidence accumulation tasks.

Figure S11: Distribution of recurrent weights between each pair of populations before (light blue) and after (dark blue) learning the 5 tasks. Each row represents a pre-synaptic neuron population, and each column represents a post-synaptic neuron population. Each histogram shows the distribution of synaptic weights over all synaptic connections that share the same pre-synaptic and post-synaptic neuron population. The vertical axis in each panel is log-scale. The horizontal axis is linear and ranges from the smallest to the largest value of each population. The number shown is 1 − D, where D is the statistic of the Kolmogorov-Smirnov test; it quantifies the similarity between the distributions (1). Exc., excitatory neurons.

Figure S12: Spatial clustering of orientation tuning of excitatory neurons in L2/3 of the V1 model before and after training. The correlation of the tuning curves of two neurons is shown as a function of the horizontal distance between them. First, we calculated the spike counts of L2/3 excitatory neurons within 100 ms while the model received gratings with different orientations (0° to 180° in steps of 5°). Then we calculated the correlation between the tuning curves of neuron pairs and plotted the correlation coefficient as a function of the horizontal distance between the neurons of each pair. The spatial clustering is weaker than in the experimental results in Fig. 2b of (22), probably because they calculated the correlation based on the joint tuning of neurons in the orientation and spatial frequency plane, whereas we did not vary the spatial frequency (we do not have a related task).
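The analysis described in the Figure S12 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the argument layout, and the binning of distances are assumptions introduced here, and the spike counts are taken as a given (neurons × orientations) array.

```python
import numpy as np

def tuning_correlation_vs_distance(counts, xy, bin_edges):
    """Mean tuning-curve correlation of neuron pairs, binned by distance.

    counts:    (n_neurons, n_orientations) spike counts per grating orientation
    xy:        (n_neurons, 2) horizontal positions of the neurons
    bin_edges: increasing distance bin edges (hypothetical choice of the user)
    """
    # Pearson correlation between the tuning curves of every neuron pair.
    corr = np.corrcoef(counts)
    i, j = np.triu_indices(len(counts), k=1)   # each unordered pair once
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)
    r = corr[i, j]
    # Assign every pair to a distance bin and average within bins.
    which = np.digitize(dist, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    mean_r = np.full(n_bins, np.nan)
    for b in range(n_bins):
        sel = (which == b) & np.isfinite(r)    # skip silent (constant) neurons
        if sel.any():
            mean_r[b] = r[sel].mean()
    return mean_r
```

Spatial clustering of orientation tuning would show up as a mean correlation that is higher in the nearest distance bins than in the farther ones.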

Figure S14: The effect of the initialization of synaptic weights on the testing accuracy (panel title: V1 model trained with different initializations). Accuracy of the V1 model trained with synaptic weights initialized from (1) (Billeh initialization) and with a random Gaussian initialization whose mean and SD equal those of the Billeh initialization.

Figure S16: Comparing the eigenspectra of neural codes in the V1 model and control models (panel label: RSNN with E/I distinction (control model 5)). (A-F) Eigenspectra of the V1 model and the 5 control models. In all control models except the one without LGN (control model 1), the power-law fits are worse than for the V1 model: either the head or the tail cannot be fitted, making the power-law span about one order of magnitude smaller than that of the V1 model. Moreover, the fitted power-law exponent is further from the experimental value (1 + 2/d).

Figure S18: Excitatory neurons in L2/3 contain and transmit most of the information that is needed to solve the 5 computational tasks (panel labels: Evidence accumulation; chance level). To demonstrate this, we trained a multi-layer perceptron (which might be seen as a proxy for higher cortical areas that are not represented in the model) that receives the spike counts during a time window of 50 ms from all excitatory neurons in L2/3 of the trained V1 model. We label the response window for each task as 0, and also show results for 50 ms time windows before that (counting backward). The weights of the multi-layer perceptron were trained with the Adam optimizer to produce the target output for each task. The multi-layer perceptron has 3 layers, with 3000 and 600 ReLU neurons in the first 2 layers, and the number of neurons in the last layer equal to the number of possible decisions for the 5 tasks. The number of training epochs is 6; the learning rate is 0.0001. (A-E) The readout network can transform the firing activity of L2/3E neurons into the target outputs for all 5 tasks with high accuracy. This demonstrates that the firing activity of the L2/3E neurons contains most of the information needed for solving the 5 tasks.
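The readout described in the Figure S18 caption can be sketched in plain NumPy as below. Only the layer sizes (3000 and 600 ReLU units), the Adam optimizer, the 6 epochs, and the learning rate of 0.0001 come from the caption; the softmax cross-entropy loss, the He initialization, the batch size, the Adam constants, and all function names are assumptions made for this illustration.

```python
import numpy as np

def init_mlp(n_in, n_out, hidden=(3000, 600), seed=0):
    """Weights and biases for a 3-layer perceptron (He initialization assumed)."""
    rng = np.random.default_rng(seed)
    sizes = (n_in, *hidden, n_out)
    return [[rng.normal(0.0, np.sqrt(2.0 / a), (a, b)), np.zeros(b)]
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the activations of all layers; the last entry holds the logits."""
    acts = [x]
    for k, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        acts.append(z if k == len(params) - 1 else np.maximum(z, 0.0))
    return acts

def predict(params, x):
    return forward(params, x)[-1].argmax(axis=1)

def train(params, x, y, epochs=6, lr=1e-4, batch=32, seed=0):
    """Train with Adam on a softmax cross-entropy loss (loss is an assumption)."""
    rng = np.random.default_rng(seed)
    m = [[np.zeros_like(w), np.zeros_like(b)] for w, b in params]
    v = [[np.zeros_like(w), np.zeros_like(b)] for w, b in params]
    b1, b2, eps, t = 0.9, 0.999, 1e-8, 0
    for _ in range(epochs):
        order = rng.permutation(len(x))
        for start in range(0, len(x), batch):
            idx = order[start:start + batch]
            acts = forward(params, x[idx])
            logits = acts[-1]
            p = np.exp(logits - logits.max(axis=1, keepdims=True))
            p /= p.sum(axis=1, keepdims=True)
            # Gradient of the mean cross-entropy with respect to the logits.
            delta = p
            delta[np.arange(len(idx)), y[idx]] -= 1.0
            delta /= len(idx)
            t += 1
            for k in reversed(range(len(params))):
                grads = [acts[k].T @ delta, delta.sum(axis=0)]
                if k > 0:
                    # Backpropagate through the ReLU before updating layer k.
                    delta = (delta @ params[k][0].T) * (acts[k] > 0)
                for q in range(2):  # q = 0: weights, q = 1: biases
                    m[k][q] = b1 * m[k][q] + (1 - b1) * grads[q]
                    v[k][q] = b2 * v[k][q] + (1 - b2) * grads[q] ** 2
                    m_hat = m[k][q] / (1 - b1 ** t)
                    v_hat = v[k][q] / (1 - b2 ** t)
                    params[k][q] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params
```

In this sketch, `x` would hold the 50 ms spike counts of the L2/3 excitatory neurons (one row per trial) and `y` the integer index of the target decision; in practice a deep learning framework would replace the hand-written optimizer.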

Supplementary Note 1
Differences between readout scenarios can explain why behavioral performance lags behind neural coding fidelity in area V1. According to (4), the behavioral discrimination threshold for orientations in the mouse was almost 100 times larger than the discrimination threshold that they inferred from the neural coding fidelity of populations of 50,000 neurons in area V1 of the mouse. They conjectured that this difference was caused by the limitations of downstream decoders. The V1 model suggests a further relevant factor. Direct measurements of coding fidelity based on simultaneous recordings from 50,000 neurons do not account for the fact that their information content has to be extracted by neurons in V1 that project to downstream areas. These simultaneous recordings are conceptually related to the postulate of a global readout that receives synaptic input from all 50,000 neurons, see Fig. S3A. We have demonstrated in the V1 model that such a global linear readout attains an accuracy of 98.18% on the fine orientation discrimination task (the global linear readout layer is trained for 6 epochs on the frozen V1 model, whose synaptic weights were trained in our default manner, simulating a readout from the recorded neurons). In contrast, a pool of 30 projection neurons in L5 achieved an accuracy of only 93.17% if they were assumed to be localized closely together (Fig. S3B), and of 94.01% if they were assumed to be randomly distributed in L5 (Fig. S3C). These results suggest that the way in which information from area V1 is extracted and projected to downstream areas is also a limiting factor, one that is likely to contribute to the gap between the performance of an ideal observer of neural activity in V1 and the behavioral performance of mice.